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PREFACE 

This  is  a  compendium  of  three  separate  articles  on  the  statistical 
analysis  of  tokamak  transport.   The  first  article  is  an  expository 
introduction  to  advanced  statistics  and  scaling  laws.   The  second 
analyzes  two  important  problems  of  tokamak  data  —  colinearity  and 
tokamak  to  tokamak  variation  in  detail.   The  third  article  generalizes 
the  Swamy  random  coefficient  model  to  the  case  of  degenerate  matrices. 
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ADVANCED  STATISTICAL  TECHNIQUES  FOR  TOKAMAK  DATA 

ANALYSIS 

K.S.  Riedel 

Courant  Institute  of  Mathematical  Sciences 

New  York  University 

New  York,  New  York  10012 

Abstract 

The  determination  of  empirical  energy  confinement  scaling  laws, 
using  linear  regression,  involves  a  number  of  statistical  hypothesesis. 
The  assumptions  of  least  squares  analysis  and  possible  breakdowns 
in  these  assumptions  are  discussed.  The  multiplicity  of  scaling  laws, 
which  almost  equally  well  describe  the  tokamak  data,  is  associated 
with  the  lack  of  data  variation  along  certain  data  directions.  A  num- 
ber of  advanced  techniques  and  diagnostics  to  treat  the  irregularities 
associated  with  fusion  data  are  given. 

I.  Standard  Least  Squares  Fit  to  Confinement  Data 

Empirical  scaling  laws  of  energy  confinement  in  tokamaks  have  been  valu- 
able in  providing  a  concise  summary  of  existing  experimental  results  and  a 
standard  to  measure  the  perrformance  of  different  devices.  A  more  contro- 
versial use  of  scaling  laws  is  the  prediction  of  energy  confinement  for  future, 
large  scale  devices  such  as  C.I.T.  and  I.T.E.R.  . 

Typically,  the  scaling  law  is  determined  by  performing  a  least  squares 
regression  to  a  large  set  of  plasma  data.  In  this  comment,  we  discuss  the 
assumptions  which  are  implicit  in  scaling  laws.  We  then  consider  advanced 


statistical  techniques  to  provide  a  more  accurate  analysis  of  existing  data. 
Finally,  we  suggest  which  new  data  will  be  of  the  most  value  in  improving 
the  reliability  of  predictions  of  energy  confinement. 

The  most  common  empirical  scaling  laws  are  geometric  scalings  of  the 
form:  t&  =  Coa^1  R?2  k03  M0*  I0* P&*  Bt  nP* .  Geometric  scaling  laws  are  con- 
venient since  they  are  linear  in  the  logarithms  of  the  macroscopic  plasma 
variables.  We  let  y^  =  lnr^  and  X{  be  a  vector  containing  the  logarithms 
of  certain  physical  quantities  such  as  the  major  and  minor  plasma  radii,  R 
and  a,  the  ellipticity,  «,  the  effective  ion  mass,  M,  the  input  power,  P,  the 
plasma  current,  7P,  the  magnetic  field,  Bt,  and  the  line  average  density,  n. 

We  now  review  the  standard  least  squares  analysis  as  it  is  commonly 
applied  to  scaling  law  studies.  A  linear  relationship  between  the  logarithm  of 
the  energy  confinement  time  and  the  logarithms  of  the  plasma  parameters, z, 
is  assumed: 

yi  =  0Xi   +    Ci 

where  Ci  is  a  Gaussianly  distributed  independent  random  variable.  The  k 
dimensional  free  parameter  vector  /3  is  calculated  by  minimizing  the  sum  of 
squares: 

min0,  £  |k  -  /?'it|2  =  min0,{Y  -  X/3>Y(Y  -  X0>) 

i 

yielding  the  least  squares  estimate: 

0={XtX)-1{XtY) 

where  X  and  Y  are  n  x  k  and  nxl  dimensional  matrices  whose  rows  consist 
of  Xi  and  yl. 


We  note  that  the  least  squares  estimate  is  the  best  possible  estimate 
within  the  class  of  unbiased  linear  estimators. 

An  unbiased  estimator  for  the  variation  of  the  measured  logarithms  of 
the  energy  confinement  time  is 


*2  =  —.-  E  I*  -  ^l2 


n  —  k 
and  the  error  in  the  estimate  of  the  free  parameter  0  is 

cov{0-t)  =  cr2(XtX)-' 

which  decreases  as  1/n.  Since  0  is  a  k  dimensional  vector  the  covariance 
matrix  has  dimension  kxk.  However  if  the  data  covariance  matrix,  XlX 
is  nearly  singular,  then  the  corresponding  directions  of  j3  are  poorly  deter- 
mined. These  directions  are  related  to  the  lack  of  independent  variation  of 
the  "independent"  variables.  This  problem  occurs  often  in  fusion  databases. 
In  predicting  the  energy  confinement  in  a  proposed  new  observation,  u, 
such  as  the  value  in  ITER,  we  must  make  the  distinction  between  the  true 
value  and  the  observed  value.  The  difference  between  the  predicted  value 
using  the  estimated  parameters  0  and  the  observed  value  contains  not  only 
the  uncertainty  from  the  covariance  of  0  but  also  the  individual  uncertainty 
from  Ci.  We  define  the  "leverage",  hi  of  Xi  as  hi  —  Xi(XlX)xi,  which  scales 
as  1/n.  The  predicted  values  for  u  are 


True  :         y(tz)  =  0u±  &^u(XtX)~1u 


Observed:         y(u)  = /3u  ±  ay  1    +  u(XlX)~lu 


Thus  the  average  error  in  our  knowledge  of  the  true  value  of  y(u)  is  propor- 
tional to  the  square  root  of  its  leverage. 

II.  Implicit  Assumptions 

We  now  discuss  the  assumptions  involved  in  the  standard  analysis  [1]. 

1.  The  linear  model  is  correct  and  on/y  a  Unite  number  of  free  parameters 
must  be  determined. 

Clearly  treating  a  complex  physical  system  such  as  a  tokamak  as  a  linear 
function  of  four  to  eight  variables  is  only  an  approximation.  The  log  linear 
scaling  may  be  thought  of  as  the  first  term  in  a  Taylor  series  expansion  of 
an  exact  nonlinear  relationship  for  rg  in  terms  of  a  small  number  of  the 
most  important  plasma  variables.  Higher  order  corrections  may  be  trivially 
included  by  adding  additional  free  parameters  corresponding  to  higher  order 
polynomials  in  x. 

Perhaps  the  most  important  use  of  dimensionless  variables  is  in  defin- 
ing the  domain  of  validity  of  the  Taylor  series  expansion.  Luckily  in  many 
applications  we  are  interested  in  the  seeding  of  energy  confinement  with  size 
and  the  dimensionless  physical  variables  are  varied  only  slightly.  Caution  is 
needed  as  a  small  (30%  )  variation  in  a  dimensionless  variable,  such  as  the 
poloidal  beta  or  edge  q,  may  change  the  physics  more  than  a  fivefold  increase 
in  device  size. 

Also  contained  in  the  assumption  that  the  model  is  correct  is  the  state- 
ment that  other  physical  variables  have  no  significant  impact  on  energy 
confinement.  Usually  the  regression  analysis  ignors  all  variables  related  to 
the  edge  plasma. 

2.  Independence  of  Errors. 


We  assume  the  departures  of  the  individual  discharges  from  the  log- linear 
scaling  are  independent.  In  particular,  we  assume  that  all  tokamaks  have  ex- 
actly the  same  scaling  and  that  all  discrepancies  in  the  log- linear  relationship 
are  due  to  discharge  to  discharge  variation  and  not  tokamak  to  tokamak  vari- 
ation. In  particular  we  assume  that  one  hundred  new  D-IIID  datapoints  are 
equally  likely  to  scale  as  the  last  hundred  D-IIID  points  and  as  any  hundred 
TFTR  points! 

An  alternative  hypothesis  is  that  on  each  tokamak  the  scalings  describe 
perfectly  the  energy  confinement.  The  differences  in  the  scaling  coefficients 
in  different  tokamaks  is  associated  with  many  small  factors.  Estimates  for 
the  true  scaling  coefficients  and  the  variance  associated  with  these  coefficients 
are  constructed  by  averaging  over  all  tokamaks: 

When  both  discharge  to  discharge  and  tokamak  to  tokamak  differences 
exist,  a  generalised  least  squares  problem  must  be  solved2. 
3.    The  variance  of  the  measurement  error,  Ci,  is  a  constant,  independent  of 
the  discharge  number. 

The  correlation  and  nonconstancy  of  the  error,  €i,  may  be  taken  into 
account  by  using  a  general  linear  model  of  multivariate  statistics.  In  general 
linear  models,  an  error  structure,  E  =  cov  (^,£j)  is  postulated  or  estimated. 
The  problem  of  estimating  the  free  parameters  /3  is  reduced  to  the  standard 
case  of  uncorrelated,  constant  variation  e^  by  multiplying  X  and  Y  by  £_1'2. 


If  X!  is  diagonal  this  corresponds  to  minimizing 

Typically  the  error  structure  is  estimated  by  fitting  the  residual  error 
matrix  to  a  model  error  matrix  with  only  a  small  number  of  free  parameters 
and  then  0  is  then  estimated  iteratively  using  the  approximate  error  matrix. 

4.  Expected  value  of  errors  is  zero. 

Statistics  cannot  eliminate  systematic  errors. 

5.  The  regressor  variables  are  measured  without  error. 

The  uncertainty  in  the  isotope  ratioin  mixed  species  discharges  is  proba- 
bly larger  than  the  uncertainty  in  te  .  All  other  global  variables  are  usually 
known  to  good  accuracy.  The  standard  assumption  that  mixed  species  dis- 
charges have  an  isotope  ratio  of  1.5  is  more  a  systematic  error  than  a  random 
error.  Local  transport  analysis  is  marked  by  uncertainties  in  deposited  power 
and  local  profile  gradients. 

6.  The  errors,  e,  have  a  Gaussian  distribution. 

This  final  assumption  is  only  used  in  hypothesis  testing  and  confidence 
intervals  for  finite  size  samples.  (By  the  central  limit  theorem,  all  such  tests 
will  be  asymptotically  valid  as  n  tends  to  infinity.) 
III.  Advanced  Regression  Techniques 
J.  Reduction  of  Variance /Colinearity 

Colinearity  occurs  when  the  matrix  XlX  is  almost  singular.  The  ordinary 
least  squares  estimator,  (3  =  (XtX)~1XtY  can  be  ill  conditioned  since  a  small 
error  in  XlY  will  produce  a  large  error  in  /3.  However  a  change  in  the  value 
of  (3  along  the  direction  of  the  approximate  null  vector  will  only  slightly 
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increase  the  residual  error.  This  near  degeneracy  in  the  scaling  coefficients, 
(3,  explains  why  so  many  scaling  laws  can  explain  the  experimental  data.  We 
now  briefly  describe  two  methods  to  stabilize  the  ill  conditioned  procedure 
of  inverting  XlX. 

la.  Principle  Components  Analysis3 

The  eigenvalues  and  eigenvectors  of  (X*X)~l  are  calculated.  The  small- 
est eigenvalues  correspond  to  the  directions  where  the  data  was  varied  the 
least  and  therefore  the  uncertainty  is  the  highest.  The  principle  compo- 
nents estimator  of  /3  is  calculated  by  projecting  (XlX)  onto  a  subspace  of 
the  eigenvectors  which  varied  sufficiently  and  then  inverting  XlX  in  that 
subspace. 

lb.  Ridge  Regression4 

A  small  multiple,  9,  of  the  identity  is  added  to  X'X  and 


1  v'v 


(3{9)  =  {XlX  +  9I)~lXt\ 

For  small  values  of  6,  only  the  near  null  eigenvalues  of  XlX  are  affected. 
9  introduces  a  biased  error  into  the  analysis  but  reduces  the  large  variation 
about  the  expected  value  of  f3.  For  sufficiently  small  9,  the  stabilizing  effect 
of  removing  the  degeneracy  always  results  in  a  smaller  mean  squared  error. 
Instead  of  setting  some  components  of  0  to  zero,  ridge  regression  reduces  all 
components  of  /3  proportionally  to  the  uncertainty  in  that  component. 

The  ridge  trace  is  the  plot  of  /3(9)  versus  9.  In  many  cases,  the  value 
of  9  which  stabilizes  the  regression  coefficients  is  apparent.  Ridge  regres- 
sion has  the  advantage  over  principle  components  that  it  simultaneously  and 
continuously  reduces  the  importance  of  the  poorly  conditioned  eigenvectors. 


Design  of  experiment  is  the  branch  of  statistics  which  seeks  to  minimize 
the  effects  of  colinearity  by  acquiring  data  in  the  directions  which  corre- 
spond to  the  smallest  eigenvalues  in  XlX.  By  balancing  the  matrix  XlX,  ill 
conditioned  data  directions  are  minimized. 
3.  Stepwise  Regressions 

When  there  are  a  large  number  of  potential  regression  variables,  a  stepwise 
regression  procedure  adds  (subtracts)  the  additional  regression  variable  which 
decreases  (increases)  the  residual  sum  of  squares  the  most  (least).  A  stepwise 
regression  enables  the  physicist  to  try  many  possibilities  in  an  automated 
procedure  before  concentrating  on  the  most  successful  variables. 

3.  Robust  Statistics 

Typically,  the  errors,  e^,  have  a  much  larger  tail  of  "unlikely"  events  than 
a  Gaussian  distribution.  These  statistical  outliers  can  have  a  large  influence 
on  the  determination  of  the  fit  coefficients  since  points  are  weighted  according 
to  their  least  squares  error. 

Robust  statistics  seeks  to  minimize  the  influence  of  outliers  by  minimizing 
a  function  that  grows  more  slowly  than  quadratically.  One  example  is  the 
absolute  value  estimator  where  /3  is  defined  by 

0       i  0       i 

4.  Resampling  Techniques6 

When  the  distribution  of  errors  is  not  Gaussian,  resampling  techniques 
create  an  empirical  distribution  of  errors.  The  basic  assumption  is  that  the 
empirical  distribution  of  errors  is  a  multinominal  distribution  of  the  observed 
errors.  The  empirical  distribution  is  then  created  by  a  Monte  Carlo  sampling 
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of  the  datapoints.  Typically,  the  empirical  distribution  of  points  has  a  much 
larger  tail  than  a  Gaussian.  The  primary  application  of  resampling  tech- 
niques, such  as  the  bootstrap  and  jackknife,  is  to  provide  error  estimates  for 
robust,  non-least  squares  estimators. 

Cross  validation  is  a  related  technique,  where  part  of  the  data  is  withheld 
from  the  initial  regression  and  the  initial  regression  is  used  to  predict  the 
new  values  of  the  energy  confinement.  By  comparing  the  predicted  values 
with  the  new  values,  an  assessment  of  the  reliability  of  the  regression  may 
be  obtained. 
III.  Regression  Diagnostics7 

To  understand  better  which  datapoints  have  a  large  role  in  determining 
the  regression  and  how  they  influence  the  fit  coefficients  a  number  of  regres- 
sion diagnostics  have  been  developed.  These  diagnostics  can  be  put  into  two 
broad  categories.  First,  diagnostics  to  assess  whether  the  residual  errors,  £i, 
are  uncorrelated  and  uniform.  Second,  diagnostics  which  assess  the  extent 
that  one  or  more  datapoints  determine  the  least  square  fit  and  the  resulting 
fit  coefficients.  All  of  the  following  diagnostics  are  available  in  the  statistical 
regression  package  SAS  [1]. 
1.  Residual  Plots 

To  assess  the  goodness  of  fit,  statisticians  have  found  it  useful  to  plot  the 
residual  error,  ti  versus  a)  the  predicted  value  yi  =  (3£i,  b)  the  individual 
independent  variables  Xi  . 

The  residual  plots  enable  the  researcher  a)  to  locate  outlying  datapoints 
where  the  fit  is  bad,  b)  find  clustering  of  the  residual  errors  into  discrete 
groups,  indicative  of  a  correlated  error  structure,  c)  identify  a  heteroscedastic 


error  structure,  i.e.  the  magnitude  of  e^  depends  explicitly  on  the  dependent 
variable  y  or  the  independent  variables  x,  i.e.  aj(yi,x,ci)  where  a  is  the 
free  parameter  vector  which  describes  the  spatial  variation  of  <r2,  d)  identify 
possible  higher  order  (quadratic)  terms  which  should  be  included,  and  e) 
identify  terms  which  have  no  significant  impact  on  the  goodness  of  fit,  except 
possibly  to  fit  one  or  two  outlying  points. 
2.  Partial  Residual  Plots 

To  isolate  the  dependencies  of  y  on  the  j-th  independent  variable  Xj,  y 
and  Xj  are  both  regressed  against  the  other  (K  —  1)  independent  variables. 
The  residual  error  of  the  y  regression  is  then  plotted  against  the  residual 
error  of  the  Xj  regression.  After  the  dependencies  from  the  other  regression 
variables  have  been  projected  out,  a  clearer  picture  of  the  number  of  terms 
necessary  for  fitting  the  dependent  variable. 

4.  Indicator  Variables 

In  regression  plots,  each  datapoint  can  be  represented  by  a  different  sym- 
bol and  color.  The  symbols  and  colors  are  chosen  to  correspond  to  important 
categories  of  data,  which  may  exhibit  different  types  of  behavior.  These  "in- 
dicator" variables  allow  one  to  check  for  the  clustering  of  the  residual  error 
into  discrete  groups. 

5.  Influence  Diagnostic  [7] 

When  one  datapoint,  (xi,yt),  is  deleted  the  following  estimates  for  the 
free  parameters  occur 
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where 

Ci  = 

Vi 

— 

0Xi, 

Pr2(i\  - 

1 

a  (i)  -. 

"(» 

— 

1) 

-k 

and 

BK-fimf  =  5^5  *°  -  („  _  i  _t)(1  _  so 


«« 


yt-y,(i)  =  (|-|(i))xt 


1  -hi 

To  determine  if  these  quantities  are  unusually  large,  we  normalize  them 
with  their  expected  deviation,  defined  using  the  estimator  <r(i)2,  not  a1. 
Corresponding  influence  diagnostics  are  available  for  the  case  when  a  number 
of  points  are  deleted. 

In  analyzing  the  L  mode  energy  confinement  data,  these  influence  diagnos- 
tics have  shown  that  the  ASDEX  data  has  a  strong  influence  in  determining 
the  aspect  ratio  scaling. 
IV.  Specialized  Tools  in  Statistics  [3] 

We  conclude  by  briefly  describing  a  number  of  special  topics  in  statistics 
and  their  applications  in  fusion  physics. 

1.  Discriminant  Analysis 

The  boundary  between  two  groups  of  points  is  determined  by  a  least 
squares  fit.  Potential  fusion  applications  include  L-H  transitions,  density 
limits,  and  disruption  boundaries. 

2.  Clustering  Analysis 

The  data  set  is  grouped  into  discrete  subsets  which  have  distinctly  dif- 
ferent scalings.  Cluster  analysis  determines  new  natural  subdivisions  of  the 
data  into  discrete  classes.  A  potential  fusion  application  is  the  grouping  of 
extremely  high  power  discharges  into  one  or  more  beam-plasma  categories. 
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V.  Improved  Datasets  for  Transport  Studies 

There  axe  four  major  structural  problems  with  existing  tokamak  databases. 
First,  the  number  of  free  parameters  which  are  determined  by  multiple  toka- 
mak comparisons  is  approximately  equal  to  the  number  of  tokamaks. 

A  typical  multitokamak  database  consists  of  about  eight  different  devices. 
From  this  data,  seven  or  eight  scaling  coefficients  plus  the  overall  constant 
are  determined.  Unfortunately,  between  two  and  four  of  the  plasma  variables 
plus  the  overall  constant  are  not  varied  on  the  individual  tokamaks.  The 
problem  of  determining  the  scalings  of  these  intertokamak  variables  is  made 
even  more  difficult  by  the  large  variation  tokamak  to  tokamak  variation  in 
scaling  laws.  Reducing  the  number  of  intertokamak  variables  by  varying 
plasma  crossectional  ellipticity  and  effective  ion  mass  in  each  tokamak  will 
greatly  improve  the  reliablity  of  the  transport  analysis. 

Second,  important  physical  variables,  which  could  have  a  significant  im- 
pact on  energy  confinement,  have  been  neglected.  These  include  the  vari- 
ables which  define  the  edge  plasma  such  as  wall  material,  relative  neutral 
penetration  depth  and  limiter/divertor  configuration.  Clearly,  detailed  sin- 
gle machine  studies  of  edge  effects  on  transport  would  help  to  define  which 
new  physical  variables  should  be  included. 

Third,  the  design  matrix  for  tokamak  databases  is  very  colinear.  Conduct- 
ing experiments  having  significant  variation  in  the  data  directions  which  have 
not  been  varied  can  greatly  reduce  the  illconditioning  of  the  linear  regres- 
sion. For  tokamak  scaling  laws5,  the  largest  uncertainty  is  in  the  geometric 
direction  of  simultaneously  increasing  aspect  ratio  and  the  ellipticity  of  the 
plasma  crosssection. 
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Finally,  the  variation  of  the  scaling  coefRents  from  device  to  device  may 
have  systematic  trends  or  a  highly  nonGaussian  random  nature.  The  most 
important  systematic  trend  is  a  decrease  in  the  value  of  current  scaling  co- 
efhent,  /35,  with  device  size.  If  /35  does  decrease  with  device  size  the  impact 
on  C.I.T.  and  I.T.E.R.  performance  is  large.  The  nonGaussian  nature  of  de- 
vice performance  is  indicated  by  some  tokamak  such  as  Dili  and  JET  having 
good  performance  with  nearly  identical  scalings  while  other  tokamaks  have 
scalings  which  differ  greatly  from  this  typical  scaling.  Physics  studies  could 
elucidate  the  reason  for  this  difference  in  performance.  In  the  meantime,  we 
are  left  with  the  choice  of  either  including  only  the  typical  tokamaks  or  all 
tokamaks  in  our  statistical  analysis. 
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Abstract 

The  large  multiplicity  of  scaling  laws  which  almost  fit  the  data 
is  associated  with  the  near  degeneracy  of  the  data  covariance  ma- 
trix. The  traditional  analysis  assumes  that  only  discharge  to  discharge 
variations  are  important.  Using  Swamy's  random  coefficient  model, 
tokamak  to  tokamak  variations  are  included  in  the  statistical  analysis. 

Scaling  laws  for  energy  confinement  are  among  the  oldest  and  most  widely 
used  tools  in  studying  tokamak  transport.  In  this  letter,  we  present  two  sta- 
tistical techniques  which  substantially  enhance  the  reliability  of  the  standard 
analysis.1 

Traditionally,  one  assumes  that  the  energy  confinement  time,  te  scales 
as  te  =  coa0iR<!hK03M0',I0*P0*Btli7n&*.  By  defining  y,  to  be  \nrEi  and  x,  to 
contain  the  logarithms  of  the  global  plasma  variables,  the  analysis  is  reduced 
to  the  solution  of  an  overdetermined  system  of  equations: 

yi^xrf+ei  (1) 

where  (3  is  a  m  dimensional  vector  of  free  parameters  and  e^  is  the  error 
associated  with  the  i-th  datapoint.  The  matrix  whose  rows  consist  of  x^  is 
called  the  design  matrix  and  is  denoted  by  X. 
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To  proceed  further,  assumptions  on  the  co variance  structure  of  the  errors, 
e^,  must  be  made.  We  assume  the  error  matrix,  E,  is  given,  where  cov(e^ej)  = 

The  parameter  vector,  /5,  can  now  be  determined  using  generalized  least 
squares  (GLS).  Namely,  (9  is  determined  by  minimizing 

nunty-X/^E-^-X/?).  (2) 

The  GLS  estimator  is 

/3  =  (X£S-1X)-1XtE-1y  (3) 

and  is  the  best  unbiased  estimator  within  the  class  of  linear  estimators  of  /3. 
Typically,1  the  error  matrix  is  assumed  to  be  a  multiple  of  the  identity 
matrix,  which  corresponds  to  assuming  the  departures  of  the  observed  values, 
y,  from  the  linear  scaling  are  statistically  independent.  Thus  device  to  device 
differences  are  neglected  in  favor  of  discharge  to  discharge  differences. 

II.  Colinearity  and  the  Multiplicity  of  Scaling  Laws 

We  now  distinguish  two  types  of  independent  variables.  Within  device 
variables  are  plasma  parameters  which  can  be  varied  in  a  single  tokamak, 
such  as  toroidal  magnetic  field  Bt,  plasma  current  7p,  line  averaged  density, 
n,  and  total  injected  power  P.  Interdevice  variables  are  not  varied  on  a  given 
tokamak.  The  major  and  minor  radii  R  and  a  are  examples.  Unfortunately, 
the  plasma  cross  section  ellipticity  e  and  the  isotope  mixture  M  are  frequently 
interdevice  variables  as  well.  We  denote  the  interdevice  variables  as  x0  and 
the  within  device  variables  as  X\. 

16 


Multitokamak  datasets  are  often  poorly  defined  in  the  sense  that  the 
number  of  different  tokamaks,  Nt,  is  only  slightly  larger  than  the  number  of 
intertokamak  variables,  m0.  When  this  occurs,  the  data  covariance  matrix, 
Xf  5Z_1  X  is  usually  close  to  singular. 

When  the  data  covariance  matrix  is  almost  degenerate,  the  GLS  estima- 
tor of  f3  is  ill-conditioned.  This  occurs  because  the  inversion  of  X'  Yl1  X 
multiplies  components  of  X'  ]C_1  y  by  A"1  where  Xj  are  the  eigenvalues  of 
X£  X^_1  X.    Thus  small  errors  in  the  observed  values  of  y  are  multiplied  by 


-l 


Statisticians  refer  to  the  problem  of  the  near  degeneracy  of  the  data  ma- 
trix as  "colinearity".  We  briefly  describe  the  two  most  commonly  used  sta- 
tistical techniques  to  reduce  colinearity  problems. 

Principle  component  analysis  decomposes  the  data  covariance  matrix 
X£  £_1  X  into  E'AE.  The  columns  of  E  are  the  eigenvector  of  the  covari- 
ance matrix  and  A  is  a  m  x  m  diagonal  matrix  whose  entries  consist  of  the 
corresponding  eigenvalues  Xj  of  £j.  The  eigenvectors  £j  are  also  called  the 
principle  components.  The  eigenvalues  Xj  are  equal  to  the  variance  of  their 
respective  eigenvectors. 

Small  eigenvalues  correspond  to  data  directions  which  have  only  been 
slightly  varied  in  the  dataset.  The  projection  of  the  parameter  vector  /3 
onto  these  directions  is  poorly  determined.  If  an  eigenvalue  Am  is  identically 
zero,  then  an  arbitrary  multiple  of  the  eigenvector  em  may  be  added  to  the 
parameter  vector  (3  without  affecting  the  goodness  of  fit. 

When  one  or  more  eigenvalues  Xj  are  almost  zero,  then  an  arbitrary  com- 
bination of  these  eigenvectors  may  be  added  to  ft  while  only  slightly  degrading 
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the  goodness  of  fit.  This  explains  why  a  large  number  of  distinctly  different 
scaling  laws  can  describe  tokamak  data1  with  only  slightly  different  goodnesses 
offit. 

The  k  principle  component  estimator  is  denned  by  setting  the  (m  —  k) 
most  ill  defined  projections  of  0  to  zero.  Mathematically 

${c  =  {E'A^E^'EVy 

where  Aj,  has  the  (tti  —  k)  smallest  eigenvalues,  Xj  set  to  zero. 

Ridge  regression4  is  a  related  method  where  a  small  multiple  of  the  iden- 
tity is  added  to  the  data  covariance  matrix.  Thus  the  ridge  estimator  is 

The  ridge  trace  is  the  plot  of  the  individual  components,  0i(8)  vs.  9  to 
determine  graphically  when  8  is  large  enough  to  stabilize  the  fit  coefficients. 

Both  ridge  regression  and  principle  components  stabilize  the  ill  condi- 
tioned least  squares  estimator  by  shrinking  the  ill  conditioned  components. 
Mathematically,  there  always  exists  a  small  enough  8  such  that  the  ridge 
estimator  has  a  smaller  variance  than  the  least  squares  estimator.  In  other 
words,  the  small  amount  of  bias  introduced  by  8  is  more  than  compensated 
by  the  improvement  in  numerical  stability. 

In  summary,  the  near  degeneracy  of  X'  £-1  X  means  that  many  param- 
eter vectors,  /3  (scaling  laws)  can  describe  the  data.  Ridge  regression  sets 
the  poorly  defined  components  of  (3  to  a  much  smaller  value  than  the  least 
squares  estimator.  Finally,  we  note  that  the  independent  variables  Xi  should 
be  centered  about  the  weighted  mean  value,  x  for  both  principle  components 
and  ridge  regression. 
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III.  Tokamak  to  Tokamak  Variation 

As  discussed  earlier,  almost  all  scaling  studies  assume  that  the  errors  (de- 
partures from  the  fitted  values)  are  independent  from  discharge  to  discharge. 
This  neglects  the  component  of  error  which  remains  constant  on  any  one 
device. 

If  this  tokamak  to  tokamak  variation  is  attributable  to  one  or  more  impor- 
tant factors  such  as  wall  material  or  limiter/divertor  configuration,  statistics 
is  of  little  help  in  analyzing  confinement.  If,  however,  the  tokamak  to  toka- 
mak differences  are  due  to  many  small  factors,  then  it  is  possible  to  analyze 
the  data  by  averaging  over  the  small  factors. 

We  now  present  and  apply  a  slight  generalization  of  Swamy's  random 
coefficient  model.  In  this  model,  we  assume  that  the  scaling  coefficients  for 
each  tokamak,  j3{  have  an  a  priori  distribution  where  E\j3i]  =  /3  and 

E[0i  - 1)0%  -  ff\  =  A** 

where  A  is  an  m  x  m  covariance  matrix. 

Given  the  covariance,  A,  the  mean  parameter  vector  0  may  be  estimated 
using  generalized  least  squares.  The  covariance  matrix  A  may  then  be  esti- 
mated iteratively  from  the  residuals,  $  —  /9. 

The  inter-tokamak  variables,  x0,  are  usually  too  poorly  determined  to  rule 
out  fixed  values  of  the  parameters.  Thus  we  adopt  a  hybrid  model  where  /30 
is  fixed  and  /3j  is  distributed  about  /3j  with  covariance  matrix  Aj.  We  do 
assume  the  overall  scaling  constant  is  random  and  included  in  x\. 

Following  [6],  we  divide  our  design  matrix  into  separate  blocks,  Xj,  for 
each  tokamak.    The  block  Xj  is  then  a  kj  x  (m0  +  raj)  matrix  where  kj  is 
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the  number  of  datapoints  from  the  j-th  tokamak.    Xj  is  then  divided  into 

The  error  matrix  for  the  j-th  block  reduces  to 

£«  =  alii  +  X1,AlXtlt 

The  GLS  estimator  for  J3  is 

Using  results  from  [6],  this  simplifes  to 
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where  E5  =  (J^.Xi,.)"",  Fj  =  kjEjX^.  and  A  =  (r^P^AiP^.   We  have 
used  the  standard  statistical  notation  that  "-"  denotes  the  Moore-Penrose 
generalized  inverse  and  Px    is  the  projection  onto  the  row  space  of  Xj. 
The  covariance  matrix,  A,  may  then  be  iteratively  estimated  from 
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To  take  into  account  the  effects  of  different  x0  for  different  tokamaks  fe-    is 
defined  as 
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We  note  that  the  error  in  our  estimate  of  /3  tends  to  zero  as  the  square 
root  of  the  number  of  tokamaks  not  as  the  square  root  of  the  number  of 
datapoints. 

The  predicted  performance  of  any  new  tokamak,  with  parameters  u  = 
{u0,ui)  is 


y{u)  =  /3  -u± 
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N 


(     vt    \ 
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The  first  term  in  the  square  root  represents  the  uncertainty  in  the  estimate 
of  J3.  The  second  term  accounts  for  the  intrinsic  variability  of  tokamaks  when 
only  the  parameters  in  u  are  specified. 

We  close  this  section  by  noting  that  the  ordinary  least  squares  estimator 
for  f3  does  tend  to  the  correct  value  of  /3  as  the  number  of  tokamaks  increases. 
However  the  GLS  estimator  using  the  random  coefficient  model  is  a  more 
efficient  estimate  of  /3.  Finally  the  error  estimates  for  J3  and  y(u)  even  scale 
incorrectly  when  ordinary  least  squares  is  used. 


IV.  Discussion 

In  no  way  do  we  want  to  pretend  that  a  complex  dynamical  system  such 
as  a  tokamak  is  completely  describable  as  a  linear  system  with  six  or  eight 
degrees  of  freedoms.  We  prefer  to  view  scaling  laws  as  a  Taylor  series  expan- 
sion about  a  particular  point  in  parameter  space,  including  only  the  most 
important  plasma  variables. 

Scaling  laws  describe  subregions  of  parameter  space  where  a  single  loss 
mechanism  or  a  special  combination  of  losses  is  dominant.  Thus  scaling  laws 
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should  not  be  trusted  in  extrapolating  to  new  regions  in  parameter  space. 
Luckily,  scaling  laws  are  most  often  used  for  the  case  when  the  physical, 
dimensionless  variables  are  only  varied  slightly  and  the  major  extrapolation  is 
in  device  size.  Furthermore  in  most  multimachine  databases,  the  component 
of  the  database  which  varies  the  most  is  strongly  associated  with  device  size. 
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Abstract 

The  Swamy  random  coefficient  model  is  generalized  to  the  case 
where  the  covariance  matrices  XjXj  for  each  individual  and  the  dis- 
persion matrix  A  for  the  free  parameters  0  are  singular. 

In  many  physical  or  economic  systems,  the  free  parameters  (3j  vary  from 
individual/individual  device  to  individual  (device).  If  the  differences  in  the 
parameter  vectors  (3d  are  due  to  many  small  differences  in  the  individuals, 
instead  of  one  or  two  significant  latent  variables,  a  statistical  treatment  is 
still  possible.  In  these  cases,  the  random  coefficient  model,1,2  introduced 
by  Swamy,  is  applied.  Swamy  assumes  that  the  parameter  vectors  (3j  are 
randomly  distributed  about  a  mean  vector  0.  More  precisely, 

#  =  /5  +  ik  (1) 

where  E[fii]  =  0,  £[£&']  =  A,  and  £[&,/Ij]  =  0,  i  ^  j. 

In  many  situations,  not  all  independent  variables  can  be  varied  in  the  in- 
dividual experiments.  One  such  problem  is  the  extrapolation  of  performance 
parameters  in  controlled  nuclear  fusion  experiments.3  A  database  consisting 
of  a  large  number  of  datapoints  from  eight  tokamak  experiments  has  been 
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collected.  The  energy  confinement  time  is  assumed  to  be  a  geometric  function 
of  certain  bulk  variables  such  as  plasma  current,  magnetic  field  and  device 
size.  However,  several  of  these  independent  variables,  namely  the  device  size 
and  device  shape,  cannot  be  varied  in  the  individual  experiments. 

In  this  note,  we  generalized  Swamy's  random  coefficient  model  to  the  case 
where  a  number  of  coefficients  can  be  determined  only  through  interindivid- 
ual  comparisons.  We  assume  that  the  z-th  individual  can  be  modeled  as 

where  yi  and  e^  are  (A\  x  1)  vectors  and  e^  is  distributed  E[e{,  e  •']  =  (TuS^Ir.. 
In  contrast  to  the  standard  random  coefficient  model,  we  assume  the 
design  matrix  has  a  block  structure: 

Xi  =  (X0i.,Xli)  (2) 

where  X,,  is  a  ki  x  m0  matrix  and  Xlt  is  a  ki  x  mi  matrix. 

We  further  assume  that  none  of  the  independent  variables  in  X0i  vary 
over  the  i-th  individual.  Thus  we  can  write  X0,  as 

Xo. =j.®x0:  (3) 

where  ji,  is  the  K{  vector  consisting  entirely  of  ones. 

For  many  cases,  including  ours,  the  number  of  individuals  is  only  slightly 
larger  than  m0,  the  number  of  parameters  which  can  only  be  determined 
by  interindividual  comparisons.  Thus  allowing  these  parameters,  f30t  to  be 
random  is  ill  conditioned.    Therefore  we  assume  only  the  parameters,  f}\, 
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describing  Xi  axe  random,  i.e.1 

/ 

A  = 

V 


0      0 

(4) 
0    Aj 


Finally,  we  assume  that  the  intercept  is  a  random  variable,  included  in 
Xj .  This  assusmption  significantly  simplifies  the  resulting  algebra  since  it 
implies  that  the  column  space  of  X,,,,  A/(X0)),  is  contained  in  the  column 
space  of  X\i. 

The  standard  generalized  least  squares  estimator  of  the  mean  parameter 
vector  is 

(  N  \~l    N 

0  =  (Xt2-1X)-lXtVy=  IXX^xA      E«W  (5) 

We  make  one  further  generalization  of  Swamy's1  results  by  assuming  the 
individual  covariance  matrices  X[  Xi    may  be  singular. 

Due  to  the  assumed  block  structure  of  A,  the  covariance  matrix  for  the 
i-th  error  structure,  XJ«  simphfies  to 

Ett  =  <7>/t  +  XnAiX^  (6) 

To  simplify  these  expressions  we  note  that  Rao's  equality  (Ref.  [4],  p. 
33)  generalized  to 

(<r2/  +  XAX')-1  =  *~2[I  -  XEX1]  +  XE{a2E  +  A)~EXl  (7) 

where  E  is  the  Moore-Penrose  generalized  inverse  of  XlX,  E  =  (XlX)~  and 
A  is  the  normalized  projection  of  A  onto  the  row  space  of  X,  A  =  PXAPX. 

We  note  that  XEXt  is  the  projection  operator  onto  the  column  space 
of  X.    Thus  the  first  two  terms  of  eq.    (7)  are  the  projection  perpendicular 
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to  the  column  space  of  X,  P±.   Since  M(X0i)  C  M(Xit),  P±X0i  =  0,  where 
P±,  =U~  XiEX\.  Thus 


->    |E^(X0j,Xlj)  = 
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where  Ej  =  (X{.Xi-)~,  Fj  =  kjEjfeuxJ.).  We  have  used  X„  X^  =  kjX0]xx 
where  x\}  is  the  mean  value  of  ij  for  the  j-th  individual.  Similarly,  X'-Ej  j/j 
simphfies  to 
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where  6~  is  the  generalized  least  squares  estimator  obtained  using  the  Moore- 
Penrose  inverse. 

Thus  the  generalized  least  squares  estimator  for  (3  simphfies  to 

IE  MM^  +  ^n^O      £[  F!      ^Ei  +  LiYbT 


ij=i 


t=i 


(9) 


Swamy's  estimator  for  the  within  individual  variation  remains  unchanged 

(10) 


bl 


Vi'PuVi 


h  -rnu 

where  m\i  is  the  actual  number  of  degrees  of  freedom  in  the  i  th  individ- 
ual regression.  However  the  estimator  for  Ai  becomes  significantly  more 
complicated  when  X\  Xit  is  singular.  The  variance  matrix  is  defined  from 


N 


N 


N 


EQAiQ,  =  -— -  Sb  -  Y,*&Kxu) 


(11) 
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where  Qt  projects  onto  the  column  space  of  X\. Xj    and 


(12) 


1=1 


Our  definition  of  Sb  differs  from  Swamy's  because  we  have  used  our  es- 
timated (3  vector  to  iteratively  define  b~ .  This  is  not  only  a  more  efficient 
estimator,  but  also  necessary  to  take  into  account  the  effects  of  different  x0 
for  different  individuals.  The  precise  definition  of  b~  is 


Qi 


t 

(iY\ 

K+K'  io. 

\ 

\°J) 

(13) 


The  second  term  corrects  the  constant  term  in  bi  for  x0. 

If  A'j'.Yi,  is  never  singular,  the  LHS  of  (11)  reduces  to  NAi.  Unfortu- 
antely,  for  degenerate  cases  we  must  solve  a  m\  x  m\  matrix  problem.  We 
rewrite  eq.   (11)  as 

(JtQi®Qt)&i  =  R  (HO 

where  0  denotes  the  Kroenecker  product  and 

Aj  =  (An,  A2i . . .  Am,i,  A12  •  •  ■  Amimi)  . 

The  resulting  linear  system  may  be  solved  to  yield  an  estimate  for  Aj. 
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APPENDIX: 

Principal  Components  Analysis  for  Degenerate  Uncentered 

Matrices 

We  assume  the  k-th  individual  experiment  is  of  the  form 

x/(l,x     +Xi  ) 

where  x,x%  are  mi  vectors  of  independent  variables  which  can  be  varied  in 
the  individual  experiments.  Let  X?  denote  the  n  x  (ttij  +  1)  data  matrix  and 
Xv  be  the  n  x  rri]  data  matrix  consisting  of  the  last  m^  columns  of  X?.  We 
define  the  normalized,  centered  data  matrix  X  as 

X  =  (Xv  -  lnx    )/y/n 

The  data  matrix  has  the  form 

(  1     *" 

XTXT  =n  I  _      _       t 

\   x    X  lX  +  xx 

We  now  derive  a  simple  expression  for  the  Moore-Penrose  generalized 
inverse  of  XjXj-  when  XlX  is  degenerate. 


Lemma.  Let  E  be  a  t0  X  £0  symmetric  non-negative  definite  matrix  of 
rank  £i  and  v,  w  be  t  vectors.  Assume  v  G  M(E)  and  w  is  orthogonal  to 
M{E).  Let 

fi  =  E  +  {v  +  w)(v  +  wY  , 
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the  Moore-Penrose  generalized  inverse  is 

{wtfElfp   +   EmpVtU1) 


^mp —  EMP 


\w\ 


_      WW 

+   (l  +  tfE^v)  — 


D 


To  determine  the  generalized  inverse  for  non-centered  matrices,  we  set 

\ 


U    ...    0 


E  = 


\ 


1 


v  +  w  = 


I 


i 


0   ...   xlx 

Let  ei, ...  eno  be  Lhe  nonrmalized  null  vectors  of  X'X,  we  define 

-     v-  -      -    (  l 

xj  =  }_  ei  ■  x  ,      w  = 

«=i  ^  Xf 

If  XlX  is  nearly  degenerate,  the  principal  components  regression  can  be 
obtained  by  projecting  out  the  nearly  degenerate  vectors. 

Finally,  the  projection  operator  onto  the  row  space  of  Xt  is 


PXt  =  ET.pE  + 


WW 


\w\ 
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